Reconfigurable Web wrapper agents for biological information integration

نویسندگان

  • Chun-Nan Hsu
  • Chia-Hui Chang
  • Chang-Huain Hsieh
  • Jiann-Jyh Lu
  • Chien-Chi Chang
چکیده

A variety of biological data is transferred and exchanged in overwhelming volumes on the World Wide Web. How to rapidly capture, utilize and integrate the information on the Internet to discover valuable biological knowledge is one of the most critical issues in bioinformatics. Many information integration systems have been proposed for integrating biological data. These systems usually rely on an intermediate software layer called wrappers to access connected information sources. Wrapper construction for Web data sources is often specially hand coded to accommodate the differences between each Web site. However, programming a Web wrapper requires substantial programming skill and is time-consuming and hard to maintain. This paper provides a solution for rapidly building software agents that can serve as Web wrappers for biological information integration. We define an XMLbased language called WNDL, which provides a representation of a Web browsing session. A WNDL script describes how to locate the data, extract the data and combine the data. By executing different WNDL scripts, user can automate virtually all types of Web browsing sessions. We also describe IEPAD, a data extractor based on pattern discovery techniques. IEPAD allows our software agents to automatically discover the extraction rules to extract the contents of a structurally formatted Web page. With a programming-by-example authoring tool, a user can generate a complete Web wrapper agent by browsing the target Web sites. We built a variety of biological applications to demonstrate the feasibility of our approach. The software is available at http://chunnan.iis.sinica.edu.tw/software.html or by contacting the authors.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Reconfigurable Web Wrapper Agents for Web Information Integration

In this paper, we presented a tool to exploit online Web data sources using reconfigurable Web wrapper agents. We described how these agents can be rapidly generated and executed based on the script language WNDL and extraction rule generator IEPAD. WNDL is an XML-based language that provides a representation of a Web browsing session. A WNDL script describes how to locate the data, extract the...

متن کامل

Reconfigurable Web Wrapper Agents

directly access the data. Web wrappers, however, must automate Web browsing sessions to extract data from the target Web pages so other applications can process that data. Each Web site has its own set of links, layout templates, and syntax. You could, in a brute-force solution, program a wrapper for each browsing session. However, such wrappers are sensitive to Web site changes and become diff...

متن کامل

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

متن کامل

Biological Data Extraction and Integration — A Research Area Background Study

My research field is highly diverse. It interweaves many different areas in information technology and bioinformatics. The system I propose to implement can automatically locate, understand, and extract online biological data independent of the source and also make it available for Semantic web agents. This research field requires background knowledge from (1) Information Extraction, (2) Schema...

متن کامل

A Web Service Based Framework for Information Integration of the Process Industry Systems

Many process industry subsystems were developed separately. They didn’t collaborate efficiently, which makes it difficult for information integration. In this paper, a web service based framework is presented to address this problem, in which every process industry subsystem is described as a web service by web service wrapper and registered in the UDDI register centre, according to different i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • JASIST

دوره 56  شماره 

صفحات  -

تاریخ انتشار 2005